Modeling Functional Data without Gaussian Process Assumption

An NHANES application

Mingyuan Li

Introduction

  • Gaussian processes (GPs) are commonly used to model functional data.
  • What if this assumption is wrong? How would we test it (at least qualitatively)?
  • How can we analyze data without the GP assumption?

NHANES Movement Data

  • 2011-2012 and 2013-2014 NHANES waves
  • Minute-level accelerometer data summarized as Monitor-Independent Movement Summary (MIMS) units
  • Larger value = more movement
  • Participants: \(N=7578\) after excluding incomplete data

Predictors

  • Continuous: Age, BMI, Poverty Income Ratio;
  • Binary: Sex, coronary heart disease history;
  • Categorical: Race, Education

Visualization of MIMS

Does transformation solve the problem?

Summary before Transformation

Summary after Transformation

Motivation

  • Data can be skewed
  • Data can have non-Gaussian tails
  • Transformation might retain or even induce non-Gaussianity

General Model Setting

In typical functional regression tasks, the mean relationship between the outcome \(Y(s)\) and the covariate vector \(X\) is modeled under the GP assumption

\[ Y_i(s)=\mu_i(s;X)+\epsilon_i(s)~,\epsilon_i(s) \sim \mathcal{GP}(0, V(\cdot,\cdot)) \] where \(V\) is the covariance operator and does NOT depend on \(X\).

  • How can we examine this assumption?
  • Direct testing for any finite subset of \(s\)?

Methodology

We approach this using mixed-effect modeling with splines: \[ \small \begin{aligned} Y_i(s)&=\overset{\text{(Fixed Effect)}}{\mu_i(s)}&+&\overset{\text{(Random Effect)}}{b_i(s)}&+&\overset{\text{(Residual Noise)}}{\epsilon_i(s)}\\ Y_i(s)&=\sum\limits_{p=1}^PX_{ip}A_p(s)&+&\sum\limits_{j=1}^J\xi_{ij}\phi_j(s)&+&~~~~~~\epsilon_i(s) \end{aligned} \]
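As a concrete illustration of this decomposition, the sketch below simulates curves from the model; the coefficient curves \(A_p\), basis functions \(\phi_j\), and Gaussian scores are arbitrary illustrative choices, not the NHANES fit:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, P, J = 240, 500, 3, 4          # grid points, subjects, covariates, basis functions
s = np.linspace(0, 1, M)

# Fixed-effect coefficient curves A_p(s) and basis functions phi_j(s) (illustrative choices)
A = np.stack([np.sin(2 * np.pi * (p + 1) * s) for p in range(P)])             # (P, M)
phi = np.stack([np.sqrt(2) * np.cos(np.pi * (j + 1) * s) for j in range(J)])  # (J, M)

X = rng.normal(size=(N, P))          # scalar covariates
xi = rng.normal(size=(N, J))         # random-effect scores (Gaussian here for simplicity)
eps = 0.1 * rng.normal(size=(N, M))  # white residual noise

# Y_i(s) = sum_p X_ip A_p(s) + sum_j xi_ij phi_j(s) + eps_i(s)
Y = X @ A + xi @ phi + eps
print(Y.shape)  # (500, 240)
```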

[Schematic: \(Y_i(s) = X_i \cdot\) (fixed-effect curves) \(+\ \xi_i \cdot\) (random-effect basis functions) \(+\) (residual noise)]

Violation of Assumptions in MIMS

  • Not the ellipse (potato) we should be seeing

Violation of Assumptions in MIMS

How Scores Affect Outcome

\[ Y_i(s)=\sum\limits_{p=1}^PX_{ip}A_p(s)+\sum\limits_{j=1}^J\xi_{ij}\phi_j(s)+\epsilon_i(s) \] When \(\xi_{ij}\) has variance dependent on \(X_i\), so does \(Y_i(s)\): \[ \begin{aligned} \small Cov[Y_i(s_1),Y_i(s_2)|X_i]&=\sum\limits_{1\le j_1,j_2\le J}Cov(\xi_{ij_1},\xi_{ij_2}|X_i)\phi_{j_1}(s_1)\phi_{j_2}(s_2)\\ &+Var[\epsilon_i(s_1)]\cdot\mathbb{I}[s_1=s_2] \end{aligned} \]
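This covariance identity can be checked numerically. The sketch below uses two independent scores (so only the diagonal terms of the double sum survive) whose variances depend on a fixed covariate value; the basis and all parameter values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_rep, J = 200_000, 2
s1, s2 = 0.2, 0.7
phi = lambda j, s: np.sqrt(2) * np.cos(np.pi * (j + 1) * s)   # illustrative basis

# Fix one covariate value and let the score variances depend on it
x = 1.5
sd = np.array([np.exp(0.3 * x), np.exp(-0.2 * x)])            # sd of xi_1, xi_2 given X=x
xi = rng.normal(size=(n_rep, J)) * sd                          # independent scores here
sigma_eps = 0.3

Y1 = xi @ np.array([phi(0, s1), phi(1, s1)]) + sigma_eps * rng.normal(size=n_rep)
Y2 = xi @ np.array([phi(0, s2), phi(1, s2)]) + sigma_eps * rng.normal(size=n_rep)

empirical = np.cov(Y1, Y2)[0, 1]
# Formula: sum_j Var(xi_j | X) phi_j(s1) phi_j(s2)   (s1 != s2, so no noise term)
theoretical = sum(sd[j] ** 2 * phi(j, s1) * phi(j, s2) for j in range(J))
print(empirical, theoretical)
```

With 200,000 replicates the empirical covariance should land within Monte Carlo error of the formula.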

How Scores Affect Outcome

\[ Y_i(s)=\sum\limits_{p=1}^PX_{ip}A_p(s)+\sum\limits_{j=1}^J\xi_{ij}\phi_j(s)+\epsilon_i(s) \] When \(\xi\) has nonzero skewness/excess kurtosis, so does \(Y_i(s)\): \[ \begin{aligned} &E\{[Y_i(s)-EY_i(s)]^3|X_i\}\\=&\sum\limits_{1\le j_1,j_2,j_3\le J}E[\xi_{ij_1}\xi_{ij_2}\xi_{ij_3}|X_i]\phi_{j_1}(s)\phi_{j_2}(s)\phi_{j_3}(s) \end{aligned} \]

Modeling Random Effect Terms

We choose to model \(Var[\xi_{ij}]=E[\xi_{ij}^2]\) using a log-linear model: \[ \log E[\xi_{ij}^2]=X_i'\beta_j \] and \(\mathrm{Cor}((\xi_{i1},\ldots,\xi_{iJ})')=C\), which does not depend on \(X\).

Now the covariance operator for \(Y_i(s)\) becomes \[ \tiny \begin{aligned} \Sigma_i(s_1,s_2)&=\sum\limits_{j_1=1}^J\sum\limits_{j_2=1}^J(e^{X_i'\boldsymbol{\beta}_{j_1}})^{1/2}(e^{X_i'\boldsymbol{\beta}_{j_2}})^{1/2}C_{j_1,j_2}\phi_{j_1}(s_1)\phi_{j_2}(s_2)\\ &+\sigma^2_\epsilon(s_1)\mathbb{I}(s_1=s_2) \end{aligned} \]

  • Values of \(Y_i(s)|X_i\) at different \(s\) can be correlated even if \(C\) is diagonal.
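A minimal sketch of assembling \(\Sigma_i\) on a grid from the log-linear variance model; the \(\beta_j\), \(C\), basis, and covariate values below are illustrative, not estimated quantities:

```python
import numpy as np

rng = np.random.default_rng(2)
M, J, P = 50, 3, 2
s = np.linspace(0, 1, M)
phi = np.stack([np.sqrt(2) * np.sin(np.pi * (j + 1) * s) for j in range(J)])  # (J, M)

beta = rng.normal(scale=0.3, size=(J, P))       # beta_j in log E[xi_ij^2] = X_i' beta_j
C = np.array([[1.0, 0.4, 0.0],
              [0.4, 1.0, -0.2],
              [0.0, -0.2, 1.0]])                # score correlation, shared across subjects
x_i = np.array([1.0, -0.5])
sigma2_eps = 0.2 + 0 * s                        # constant noise variance for the sketch

sd = np.exp(x_i @ beta.T / 2)                   # (e^{X_i' beta_j})^{1/2}, length J
score_cov = np.outer(sd, sd) * C                # Cov(xi_i | X_i)
Sigma_i = phi.T @ score_cov @ phi + np.diag(sigma2_eps)
print(Sigma_i.shape)
```

The result is a symmetric positive-definite \(M\times M\) matrix; off-diagonal entries are nonzero even when \(C\) is diagonal, because each \(\phi_j(s_1)\phi_j(s_2)\) term couples time points.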

Parameter Estimation

Parameter Estimation: Fixed Effect

One natural way is to do spline regression \(\small A_p(s)=\sum\limits_{l=1}^L\gamma_{pl}B_l(s)\).

  • Problem: requires storing a design matrix with \(NM\) rows and \(PL\) columns:

\[ Y_i(s_j)\sim\sum\limits_{p,l}\gamma_{pl}[X_{ip}B_l(s_j)] \]

In our case: \(N\approx7000\), \(M=1440\), \(P=13\), \(L=24\)
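For intuition, the storage cost of that design matrix is easy to tally (assuming float64 entries):

```python
# Back-of-the-envelope size of the full function-on-scalar design matrix
N, M, P, L = 7000, 1440, 13, 24
rows, cols = N * M, P * L
gib = rows * cols * 8 / 2**30          # float64 entries, 8 bytes each
print(rows, cols, round(gib, 1))       # 10,080,000 rows x 312 columns ~ 23.4 GiB
```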

Parameter Estimation: Fixed Effect

Solution: Fast Univariate Inference (FUI) (Cui et al., 2021)

1-1: Perform OLS regression first using epoch-level data: \[ \color{grey}{Y_i(s_j)=}Y_{ij}\sim\sum\limits_{p}X_{ip}\tilde A_{pj} \]

1-2: Apply any smooth regression technique (e.g. LOESS, GAM) to \(\tilde A_{p1},\cdots,\tilde A_{pM}\) to get a smooth estimate \(\hat A_p(s)\).
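The two FUI steps can be sketched as follows; the data are simulated rather than NHANES, and a simple moving average stands in for the LOESS/GAM smoother:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, P = 400, 120, 3
s = np.linspace(0, 1, M)
A_true = np.stack([np.sin(2 * np.pi * s), np.cos(2 * np.pi * s), s - 0.5])  # (P, M)
X = rng.normal(size=(N, P))
Y = X @ A_true + 0.5 * rng.normal(size=(N, M))

# Step 1-1: separate OLS at each grid point s_j -> raw coefficient estimates A~_pj
A_raw, *_ = np.linalg.lstsq(X, Y, rcond=None)          # (P, M); column j is the fit at s_j

# Step 1-2: smooth each raw coefficient sequence across s (moving average stands in
# for LOESS/GAM)
k = 7
kernel = np.ones(k) / k
A_hat = np.stack([np.convolve(np.pad(a, k // 2, mode="edge"), kernel, mode="valid")
                  for a in A_raw])
print(np.max(np.abs(A_hat - A_true)))
```

The key point is that step 1-1 only ever solves \(M\) small regressions, so the \(NM \times PL\) design matrix is never formed.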

Visualizing FUI

One-step Residual

\[ \small \begin{aligned} Y_i(s)&=\color{grey}{\sum\limits_{p=1}^PX_{ip}A_p(s)}+{\sum\limits_{j=1}^J\xi_{ij}\phi_j(s)}+{\epsilon_i(s)} \end{aligned} \]

Parameter Estimation: Random Effect

\[ \small \begin{aligned} Y_i(s)&=\color{grey}{\sum\limits_{p=1}^PX_{ip}A_p(s)}+{\sum\limits_{j=1}^J\xi_{ij}\phi_j(s)}+{\epsilon_i(s)} \end{aligned} \]

2-1: Regress residuals \(\small\hat r_i(s)=Y_i(s)-\sum\limits_{p}X_{ip}\hat A_p(s)\) on \(\small\phi_1(s),\cdots,\phi_J(s)\) with a penalty to get score estimates \(\hat\xi_{ij}\). Record the effective degrees of freedom \(edf_i\).

2-2: Fit a quasi-Poisson GLM \[ \small \color{grey}{Var[\hat{\xi_{ij}}]\approx}E[\hat\xi_{ij}^2]=\exp(X_i'\beta_j);~~~ Var[\hat\xi_{ij}^2]\propto E[\hat\xi_{ij}^2] \]
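Steps 2-1 and 2-2 can be sketched as below: a ridge-penalized basis regression recovers the scores, and a hand-rolled IRLS loop plays the role of the quasi-Poisson GLM (in practice one would use standard GLM software). All parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, J = 600, 100, 2
s = np.linspace(0, 1, M)
phi = np.stack([np.sqrt(2) * np.sin(np.pi * (j + 1) * s) for j in range(J)])  # (J, M)

# Heteroscedastic scores: log E[xi_ij^2] = X_i' beta_j (illustrative truth)
X = np.column_stack([np.ones(N), rng.binomial(1, 0.5, N)])
beta_true = np.array([[0.0, 0.8], [0.5, -0.6]])              # one row of coefficients per j
xi = rng.normal(size=(N, J)) * np.exp(X @ beta_true.T / 2)
r = xi @ phi + 0.3 * rng.normal(size=(N, M))                 # residual curves r_i(s)

# Step 2-1: ridge-penalized regression of each residual curve on the basis
lam = 1e-3
G = phi @ phi.T + lam * np.eye(J)
xi_hat = r @ phi.T @ np.linalg.inv(G)                        # (N, J) score estimates

# Step 2-2: quasi-Poisson fit of xi_hat^2 on X (log link) via IRLS, one j at a time
def quasi_poisson(y, X, iters=50):
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ b)
        W = mu                                               # working weights for log link
        z = X @ b + (y - mu) / mu                            # working response
        b = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return b

beta_hat = np.stack([quasi_poisson(xi_hat[:, j] ** 2, X) for j in range(J)])
print(np.round(beta_hat, 2))
```

The quasi-Poisson choice only changes the standard errors, not the point estimates, so plain Poisson IRLS recovers \(\beta_j\).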

Parameter Estimation: Variance

\[ \small \begin{aligned} Y_i(s)&=\color{grey}{\sum\limits_{p=1}^PX_{ip}A_p(s)}+{\sum\limits_{j=1}^J\xi_{ij}\phi_j(s)}+{\epsilon_i(s)} \end{aligned} \]

2-3: Recover \(C\) from the standardized scores \(\hat{\xi_{ij}^*}=\hat{\xi_{ij}}(\exp(X_i'\hat\beta_j))^{-1/2}\).

2-4: Recover \(\sigma_\epsilon^2(s)\) by smoothing the squared residuals \[ \hat\epsilon_i(s)^2=\frac{[\hat r_i(s)-\sum_j\hat\xi_{ij}\phi_j(s)]^2}{\mathbf{1-edf_i/M}} \]
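The \(1-edf_i/M\) rescaling in step 2-4 corrects the downward bias that the basis fit induces in the squared residuals. A minimal check on pure-noise curves (a common unpenalized basis fit, so every curve shares one \(edf\)):

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, J = 2000, 60, 4
s = np.linspace(0, 1, M)
phi = np.stack([np.sqrt(2) * np.sin(np.pi * (j + 1) * s) for j in range(J)])
sigma2_true = 0.25

# Pure-noise curves projected onto the basis: the fitted values absorb part of the noise
eps = np.sqrt(sigma2_true) * rng.normal(size=(N, M))
H = phi.T @ np.linalg.inv(phi @ phi.T) @ phi       # hat matrix of the basis fit
edf = np.trace(H)                                  # effective degrees of freedom (= J for OLS)
resid = eps - eps @ H

naive = np.mean(resid ** 2)                        # biased downward
corrected = np.mean(resid ** 2) / (1 - edf / M)    # step 2-4 style rescaling
print(round(naive, 4), round(corrected, 4))
```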

Parameter Estimation: Variance

We fit Gamma GAMs on \(\hat\epsilon_i^2(s)\) against \(s\): \[ \small\hat\epsilon_i^2(s)\sim Gamma(\mathrm{shape}=\alpha,\mathrm{mean}=\sigma_\epsilon^2(s)) \] To speed up the calculation, we use the following trick \[ \small E[\ln\hat\epsilon_i^2(s)]=\psi(\alpha)+\ln\frac{\sigma^2_\epsilon(s)}{\alpha},~ Var[\ln\hat\epsilon_i^2(s)]=\psi'(\alpha) \]

  • \(\alpha\) is estimated via Method of Moments
  • \(\sigma^2_\epsilon(s)\) is estimated via Gaussian GAM
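A sketch of the method-of-moments step: the sample variance of \(\ln\hat\epsilon_i^2\) is matched to \(\psi'(\alpha)\) and inverted by bisection. The trigamma function is approximated here by a second difference of `math.lgamma` (adequate for this sketch), and the data are simulated:

```python
import math
import numpy as np

def trigamma(x, h=1e-4):
    # psi'(x) as a second difference of log-gamma (good enough for this sketch)
    return (math.lgamma(x + h) - 2 * math.lgamma(x) + math.lgamma(x - h)) / h**2

def alpha_mom(log_e2):
    # Invert Var[ln eps^2] = psi'(alpha) by bisection (psi' is strictly decreasing)
    v = np.var(log_e2)
    lo, hi = 1e-3, 1e3
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if trigamma(mid) > v else (lo, mid)
    return (lo + hi) / 2

rng = np.random.default_rng(6)
alpha_true = 2.5
e2 = rng.gamma(shape=alpha_true, scale=1.0 / alpha_true, size=200_000)  # mean 1
alpha_hat = alpha_mom(np.log(e2))
print(round(alpha_hat, 2))
```

Because \(Var[\ln\hat\epsilon_i^2(s)]\) does not involve \(\sigma_\epsilon^2(s)\), the shape parameter can be estimated once, before any smoothing over \(s\).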

Parameter Estimation: Variance

\[ \small E[\ln\hat\epsilon_i^2(s)]=\psi(\alpha)+\ln\frac{\sigma^2_\epsilon(s)}{\alpha},~ Var[\ln\hat\epsilon_i^2(s)]=\psi'(\alpha) \]

Note: if we first average \(\hat\epsilon_i^2(s)\) across subjects at each time \(s\) and then smooth the averages, we tend to underestimate the standard error of the noise-variance estimator (by about 15%)

Finding Higher-order Moments

\[ \tiny \begin{aligned} Y_i(s)&=\color{grey}{\sum\limits_{p=1}^PX_{ip}A_p(s)}+{\sum\limits_{j=1}^J\xi_{ij}\phi_j(s)}+{\epsilon_i(s)} \end{aligned} \]

2-5: Perform linear regressions \[ \tiny \begin{aligned} Skew(\hat\xi_{ij})\approx E[\tilde{\xi_{ij}^*}^3]&=X_i'\delta_{\text{skew},j}\\ Kurt(\hat\xi_{ij})\approx E[\tilde{\xi_{ij}^*}^4-3]&=X_i'\delta_{\text{kurt},j} \end{aligned} \] and on other cross-products, such as \(\tilde{\xi_{i1}^*}^2\tilde{\xi_{i2}^*}\).

2-6: Recover the skewness/kurtosis of \(Y_i(s)|X_i\) using the 2nd-4th moments of the scores

Interval Estimation

  • Standard errors estimated via nonparametric bootstrap, i.e. resampling individuals with replacement

  • Pointwise Wald intervals: \(\pm 1.96\times SE\)

  • Correlation and Multiplicity Adjusted (CMA) intervals: \(\pm Q\times SE\)

Should we use a symmetric CI when we no longer assume symmetry in the data?

CMA Interval with Some Asymmetry

  1. Calculate the pointwise sample variance \(\small\hat D(t)=\hat{Var}(\hat f^*_i(t))\)
  2. Calculate the maximum and minimum Z-score: \(\small M_i=\max_t\frac{\hat f^*_i(t)-\bar f(t)}{\hat D(t)^{1/2}},m_i=\min_t\frac{\hat f^*_i(t)-\bar f(t)}{\hat D(t)^{1/2}}\)
  3. Let \(\small Q_U\) be \(\small 97.5\%\) quantile of \(\small -m_1,-m_2,\cdots,-m_B\)
  4. Let \(\small Q_L\) be \(\small 2.5\%\) quantile of \(\small -M_1,-M_2,\cdots,-M_B\)
  5. Calculate the interval \(\small[\hat f_0(t)+Q_L\hat D(t)^{1/2},\hat f_0(t)+Q_U\hat D(t)^{1/2}]\) (note \(Q_L\) is typically negative)
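The five steps translate directly to a few lines of numpy. The bootstrap replicates below are synthetic stand-ins; note that \(Q_L\), being a lower quantile of the \(-M_b\), comes out negative, so the lower limit \(\hat f_0 + Q_L\hat D^{1/2}\) falls below the point estimate:

```python
import numpy as np

rng = np.random.default_rng(7)
B, T = 2000, 50
t = np.linspace(0, 1, T)
f0 = np.sin(2 * np.pi * t)                                  # point estimate
boot = f0 + rng.normal(scale=0.1, size=(B, T)) * (1 + t)    # synthetic bootstrap replicates

fbar = boot.mean(axis=0)
D = boot.var(axis=0)                                        # pointwise variance D(t)
Z = (boot - fbar) / np.sqrt(D)                              # standardized deviations
Mx, mn = Z.max(axis=1), Z.min(axis=1)                       # max / min Z-score per replicate

Q_U = np.quantile(-mn, 0.975)                               # upper multiplier (positive)
Q_L = np.quantile(-Mx, 0.025)                               # lower multiplier (negative)
lower = f0 + Q_L * np.sqrt(D)
upper = f0 + Q_U * np.sqrt(D)
print(round(Q_L, 2), round(Q_U, 2))
```

Swapping the reflected minima and maxima between the two tails is what lets the band become asymmetric when the bootstrap distribution is skewed.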

Current Results

Fixed Effect

\(\tiny Y_i(s)={\sum\limits_{p=1}^PX_{ip}{A_p(s)}}+\color{grey}{\sum\limits_{j=1}^J\xi_{ij}\phi_j(s)}+\color{grey}{\epsilon_i(s)}\)


Variance Component: Noise

\[ \tiny \begin{aligned} Y_i(s)&=\color{grey}{\sum\limits_{p=1}^PX_{ip}{A_p(s)}}+\color{grey}{\sum\limits_{j=1}^J\xi_{ij}\phi_j(s)}+{\epsilon_i(s)} \end{aligned} \]

Variance Component: Full Residual

\(\tiny Y_i(s)=\color{grey}{\sum\limits_{p=1}^PX_{ip}{A_p(s)}}+{\sum\limits_{j=1}^J\xi_{ij}\phi_j(s)}+{\epsilon_i(s)}\)


Comparing Variability between Subgroups

Variance Component: Correlation

Variance Component: Correlation

Skewness within Subgroups


Kurtosis within Subgroups


Simulation

Work in Progress

Data Generation

[Schematic: \(Y_i(s) = X_i \cdot\) (fixed-effect curves) \(+\ \xi_i \cdot\) (basis functions) \(+\) (noise)]

  • Fit the original NHANES data with fewer covariates and a smaller smooth basis
  • Approximate the fixed effects and noise variance using piecewise simple functions
  • Approximate the score distribution using a marginal inverse-quantile transformation of an MVN
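The marginal inverse-quantile transformation can be approximated empirically by rank-matching sorted Gamma draws to the Gaussian margins, which sidesteps an explicit Gamma quantile function; the correlation and shape values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
g = rng.multivariate_normal(np.zeros(2), Sigma, size=n)     # MVN draws

# Marginal inverse-quantile transform, done empirically: reorder sorted Gamma draws
# by the ranks of each Gaussian margin (shape parameters are illustrative)
def to_gamma(z, shape):
    target = np.sort(rng.gamma(shape, size=z.size))
    out = np.empty_like(z)
    out[np.argsort(z)] = target            # match ranks: smallest z gets smallest gamma
    return (out - shape) / np.sqrt(shape)  # standardize to mean 0, variance 1

xi = np.column_stack([to_gamma(g[:, 0], 3.0), to_gamma(g[:, 1], 50.0)])
rho = np.corrcoef(xi.T)[0, 1]
print(round(rho, 2))
```

The monotone transform preserves the rank correlation exactly, so the Pearson correlation of the Gamma-margin scores stays close to the 0.6 used for the Gaussian copula.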

Synthetic Data: Parameters

\[ \tiny \begin{aligned} X_1&=1,X_2\sim Uniform(-30,30),\\ X_3&\sim Bernoulli(0.5),X_4\sim Bernoulli(0.3)\\ A(s)&=(A_1(s),A_2(s),A_3(s),A_4(s))\text{ where }\\ A_1(s)&=2.5-1.8\exp\{2(1-\cos\frac{(s-200)\pi}{720})\}\\ A_2(s)&=\begin{cases} 0.005-0.01(1-\cos\frac{(s-480)\pi}{960})&480\le s\le 1440\\ 0.005-0.01(1-\cos\frac{(s-480)\pi}{480})&0\le s< 480 \end{cases}\\ A_3(s)&=\begin{cases} 0.1-0.125(1-\cos\frac{(s+240)\pi}{540})&0\le s<300\\ 0.1-0.125(1-\cos\frac{(s-600)\pi}{300})&300\le s<600\\ 0.1&600\le s<1200\\ 0.1-0.125(1-\cos\frac{(s-1200)\pi}{1740})&1200\le s<1440 \end{cases}\\ A_4(s)&=\begin{cases} 0.02-0.125(1-\cos\frac{(s-180)\pi}{1200})&0\le s<180\\ 0.02-0.125(1-\cos\frac{(s-180)\pi}{240})&180\le s<420\\ -0.23+0.125(1-\cos\frac{(s-420)\pi}{1200})&420\le s<1440 \end{cases}\\ \sigma_\epsilon^2(s)&=0.1+0.35\exp[-4(1-\cos\frac{(s-360)\pi}{720})]+0.25\exp[-4(1-\cos\frac{s\pi}{720})] \end{aligned} \]

Synthetic Data: Score Distribution

  • Sample \(\tiny \gamma\sim MVN(0,\Sigma)\) where \[\tiny\Sigma=\begin{pmatrix} 1& -0.4& 0.7& -0.4& -0.2 \\ -0.4& 1& -0.4& 0.5& -0.1 \\ 0.7& -0.4& 1& -0.5& 0.1 \\ -0.4& 0.5& -0.5& 1& -0.5\\ -0.2& -0.1& 0.1& -0.5& 1\\ \end{pmatrix}\]

  • Marginally transform \(\gamma_1,\gamma_2,\gamma_3,\gamma_5\) to Gamma distributions with shape parameters 3, 50, 20 and 4, then standardize to mean 0 and variance 1

  • Flip the sign of \(\gamma_1,\gamma_3\)

Synthetic Data: Score Distribution

  • Let \[ \tiny\xi_j=\xi_j^*\exp(\frac{1}{2}\sum\limits_{i=1}^4b_{ij}x_i) \] where \(\xi_j^*\) is the standardized score from the previous slide and (rows indexed by \(j\), columns by \(i\)) \[ \tiny B^\top=(b_{ij})^\top_{5\times 4}=\begin{pmatrix} -0.2&-0.009&-0.04&-0.07\\ -0.01&-0.05&-0.1&-0.06\\ -0.03&0.2&0.1&-0.005\\ 0.01&-0.8&-0.006&-0.2\\ -1&-0.003&0.02&-0.02 \end{pmatrix} \]

Synthetic Data: Skewness/Kurtosis

Fixed Effect Estimate

Noise Variance Estimate

Coverage % (Wald; CMA)       N=100     N=300     N=1000
Fixed Effect \(A_2(t)\)
  freq = 1 min               93; 88    94; 87    96; 94
  freq = 3 min               93; 86    92; 87    94; 94
  freq = 10 min              95; 94    95; 92    95; 92
Noise Variance
  freq = 1 min               95; 89    95; 96    95; 91
  freq = 3 min               96; 92    95; 93    95; 93
  freq = 10 min              95; 88    94; 91    94; 93

Analysis to be done next

  • Examine the behavior of the conditional variance, skewness, and kurtosis estimators via simulation
  • Find ways to construct prediction intervals given \(X_i\), accounting for non-Gaussian properties
  • Identify outliers and classify/cluster them based on the timing and extent of exceeding the prediction band

References

Backenroth, D., Goldsmith, J., Harran, M. D., Cortes, J. C., Krakauer, J. W. and Kitago, T. (2018). Modeling motor learning using heteroscedastic functional principal components analysis.

Crainiceanu, C. M., Goldsmith, J., Leroux, A. and Cui, E. (2024). Functional Data Analysis with R.

Cui, E., Leroux, A., Smirnova, E. and Crainiceanu, C. M. (2021). Fast univariate inference for longitudinal functional models.

Reiss, P. T., Huang, L. and Mennes, M. (2010). Fast function-on-scalar regression with penalized basis expansions.

Zhao, Y., Caffo, B. S. and Luo, X., for the Alzheimer's Disease Neuroimaging Initiative. Longitudinal regression of covariance matrix outcomes.